1.   Introduction

This  paper  discusses  some  aspects  of  the  relationship  between 
sequential  circuits  and  combinational  circuits.   Circuit  design  in  both 
areas  has  been  studied  extensively  in  the  past.   Past  studies  have  included 
efforts  to  reduce  the  time  and  gates  required  to  compute  various  functions. 
This  paper  establishes  upper  bounds  on  time  and  gates,  and  also  provides 
a  systematic  procedure  for  transforming  a  sequential  circuit  design  into 
a  combinational  circuit. 

The upper bounds on time which we prove are quite good relative
to the best known lower bounds in most cases. We also give gate bounds,
which have often eluded detailed analysis in the past. Our gate bounds
seem quite sharp relative to the actual numbers found in real logic design
examples.

The  algorithms  we  have  for  transforming  sequential  circuit  designs 
into  combinational  ones  yield  circuits  which  meet  the  above-mentioned  gate 
and  time  bounds.   In  this  sense,  we  present  a  uniform  design  procedure  for 
the  realization  of  any  linear  sequential  machine  in  combinational  circuit 
form.   The  advantage  of  this  is  that  one  can  often  specify  the  behavior  of 
some  desired  function  quite  easily  as  a  sequential  circuit.   It  is  somewhat 
more  difficult  to  translate  such  a  specification  into  a  faster  combinational 
circuit  form.   A  classic  example  is  the  ease  with  which  a  bit  serial  adder 
is  specified  in  sequential  form.   On  the  other  hand,  the  design  of 
combinational  parallel  adders  (with  various  lookahead  schemes)  occupied 
many  logic  designers  for  some  years  in  the  1950s.   The  automatic  design  of 
a  fast  parallel  combinational  adder  derived  from  a  bit  serial  specification 
is  one  example  of  the  use  of  our  method. 


Not all interesting logic design problems are presented in a
sequential form that is linear. As we shall see later, multiplication is
an example. While some nonlinear cases can be linearized mathematically,
we shall discuss another approach. We will show how nonlinear logic
circuits can be used to remove the nonlinearity in the sequential
specification. Then, in terms of elements which contain the nonlinearities,
we obtain a linear system at a higher level. Our method can then be applied
in a straightforward way.

An  important  question  in  modern,  practical  logic  design  is  what 
to  put  in  one  integrated  circuit  package  and  then  how  to  synthesize  useful 
circuits  using  such  packages.   One  of  the  methods  we  present  deals  with 
what  can  be  regarded  as  logic  design  at  the  integrated  circuit  package 
level.   We  show  what  logic  should  be  contained  in  a  package  and  then  give 
a  method  for  interconnecting  packages.   Again  our  discussion  is  centered 
on  transforming  given  sequential  logic  specifications  into  combinational 
logic  in  the  form  of  packages.   This  is  closely  related  to  the  subject  of 
the  previous  paragraph  in  the  sense  that  nonlinear  logic  functions  can 
often  be  hidden  in  integrated  circuit  packages,  leaving  us  with  a  linear 
problem  at  a  higher  level. 

Throughout the paper we illustrate our methods with
examples giving gate and time bound coefficients for several practically
useful logic design problems including adders, multipliers, and ones' position
counters.

The techniques described in this paper are variations on our
earlier efforts to design fast parallel operation computers [1], [2].
There our basic units were adders and multipliers which operated on whole
floating-point numbers, while here we are dealing with logic design at
lower levels. In this paper we deal with operations on bits and bytes
at the gate and integrated circuit package level. It is important to
notice that mathematically, precisely the same ideas and algorithms are
used at all levels; only the details of the technology change. Thus we
feel that in attempts to automate the design of general purpose or
special purpose machines, one set of underlying ideas may be of general
use.

The following definitions and assumptions will hold throughout
the paper. An atom is a constant or variable denoted by a lower case
letter. In some parts of the paper we will deal with Boolean atoms
(which have value 0 or 1) and in other parts we will deal with arithmetic
atoms (which represent binary numbers). A dyadic Boolean operator is
either a logical OR or a logical AND. A dyadic arithmetic operator is
either an addition or a multiplication operator. We denote these by + and ·
respectively, in either case. The context will make our meaning clear
when necessary, and in some cases the same result will hold in either the
Boolean or the arithmetic case.

Except as noted in the paper, we assume that all Boolean NOTs
and arithmetic subtractions are distributed down to the level of atoms.
In the arithmetic case, this is discussed in [3], while in the Boolean
case a similar procedure may be carried out using DeMorgan's Laws. We do
this without loss of generality to simplify our discussion.

An expression (Boolean or arithmetic) is a well-formed string
consisting of atoms and operators and is denoted by an upper case letter.
We write E<e>, for example, to denote an expression E containing e atoms.
The distinction between Boolean and arithmetic atoms and expressions will
be clear by the context of our discussion.

We assume throughout the paper that AND, OR and NOT gates each
have one gate delay of unit time. We assume that all AND and OR gates have
fan-in 2 and fan-out f. By dealing with such stylized gates we are able
to compare various designs in elementary terms. If one assumes more
complex gates with higher fan-ins, our gate and time upper bounds can
obviously be reduced, in general. Another way in which the coefficients
in our bounds can be uniformly improved is by ignoring the time required
to complement signals. Many circuit families have gates in which both
true and complemented outputs are available with no time or cost penalty.
To make our bounds conservative and as widely useful as possible, we have
not taken advantage of any such features.

We  emphasize  the  fact  that  in  practice  fan-out  is  usually  greater 
than  fan-in,  but  fan-out  delays  may  be  nonnegligible.   We  account  for  fan- 
out  delays  and  gates  in  all  of  our  bounds.   Thus  our  results  represent  a 
more  refined  treatment  than  is  usually  found  in  abstract  bounds  of  this 
type  which  often  ignore  fan-out  limitations. 

We use the notation $T_G[E]$ to denote the number of gate delays
in a circuit which implements expression E using G gates. Similarly, we
use the notation $T_P[E]$ to denote the number of processor delays required to
compute E using P processors.

Throughout the paper we use log x to denote $\log_2 x$.


2.   Combinational  Circuits 

In this section we discuss gate and time bounds for combinational
logic circuits. We give bounds for gates with fan-out f and fan-in 2.
After giving some elementary fan-out and combinational fan-in bounds we
present an overall circuit bound. This is expressed in terms of the number
of inputs and outputs, and could, for example, be used to bound the gates
and time needed for an integrated circuit package.

Throughout  the  paper,  we  assume  that  signals  appear  from  some 
external  source  and  are  returned  to  some  external  destination  after  our 
operations  on  them.   Effectively  we  are  ignoring  registers  from  which 
signals  come  and  to  which  they  are  returned.   Thus  we  can  count  gates  and 
time  delays  and  compose  them  in  a  uniform  way,  without  ad  hoc  accounting 
procedures  at  the  source  and  destination  of  our  signals. 

Our  first  lemma  concerns  the  fan-out  of  signals  and  will  be  used 
extensively  later. 

Lemma 1   An e-way fan-out can be accomplished using gates with
fan-out of $f \ge 2$ in

    $T_G \le \lceil \log_f e \rceil - 1$

with

    $G < \frac{e-1}{f-1}$.

Proof   The first stage of fan-out is accomplished by either an
external source (which we ignore) or a previous combinational gate which
will be counted elsewhere. The destination of our signals is either external
(hence ignored) or combinational gates which are accounted for elsewhere.
This is illustrated in Figure 1.

Thus we can fan out to f places with zero gates, to f-1+f places
with one gate, to f-2+2f places with two gates, and to f-G+Gf places with
G gates. Since we want $e \le f-G+Gf$, for the smallest such G we see that
$e \le f-G+Gf < e+f-1$. Thus we have

    $G < \frac{e-1}{f-1}$.

We can fan out to f places in zero time, to $f^2$ places in 1 time
unit, to $f^3$ places in 2 time units, and to $f^k$ places in k-1 time units. It
follows that for $e \le f^k < fe$ we have $k < 1 + \log_f e$, so $k - 1 < \log_f e$.
But $k - 1 = T_G$, so the lemma is proved.

Q.E.D.
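The counting argument in the proof of Lemma 1 can be checked numerically. The following is a minimal sketch, not part of the original paper: the function name `fanout_cost` is hypothetical, and the gate count and the delay are computed separately, each by the corresponding counting rule of the proof (zero gates reach f places and each further gate adds f-1 places; $f^k$ places are reachable in k-1 delays).

```python
def fanout_cost(e, f):
    """Return (gates, time) for fanning one signal out to e destinations
    with gates of fan-out f, the first stage being free as in Lemma 1.
    Hypothetical helper written for illustration only."""
    if e < 1 or f < 2:
        raise ValueError("need e >= 1 and f >= 2")
    # Gates: zero gates reach f places; each extra gate adds f - 1 places.
    gates = 0
    while f + gates * (f - 1) < e:
        gates += 1
    # Time: f**k places are reachable in k - 1 gate delays.
    k = 1
    while f ** k < e:
        k += 1
    return gates, k - 1


if __name__ == "__main__":
    for e, f in [(8, 8), (28, 8), (9, 2), (138, 8)]:
        gates, time = fanout_cost(e, f)
        print(f"e={e:4d} f={f}: gates={gates}, delays={time}")
```

For e = 28 and f = 8 this gives 3 gates and 1 delay, which respects both bounds of the lemma (3 < 27/7 and 1 = ⌈log_8 28⌉ - 1).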


Next, we bound the gates and time in the combinational part of any logic
circuit.

Lemma 2   [3], [4]

Any Boolean expression E<e> of e atoms can be realized using
gates of fan-in 2 in

    $T_G[E\langle e\rangle] \le \begin{cases} 1 + 2d + \lceil \log e \rceil & \text{if } d < \frac{3}{2}\log e \\ 4\lceil \log e \rceil & \text{otherwise,} \end{cases}$

with

    $G[E\langle e\rangle] \le \begin{cases} e - 1 & \text{if } d < \frac{3}{2}\log e \\ 2(e-1) & \text{otherwise,} \end{cases}$

where d is the depth of parenthesis nesting in E.


[Figure 1. Signal Fan-out: a signal passes through Stage 1 and Stage 2 to the
signal destinations.]


The proof of this lemma for $d < \frac{3}{2}\log e$ is found in [3]. In
most practical expressions, the depth of parenthesis nesting is small, so
this provides the best bound. However, if $d \ge \frac{3}{2}\log e$, we use the second
half of the lemma, which is proved in [4], where it is also shown that this
may be extended to $T_G[E\langle e\rangle] \le 3 \log n$ with $G[E\langle e\rangle] \le 2.5e$. We have found
that for practical purposes a low gate bound is more important than a low
time bound, however. In much of the following we will use Lemma 2, assuming
for simplicity that $d < \frac{3}{2}\log e$.

Next we define a combinational circuit and then give overall
gate and time bounds for such circuits.

Definition  1 

A combinational circuit C<r,s,e,n,d> is defined by

1)  A set of inputs $x_i$, $1 \le i \le r$.

2)  A set of outputs $y_j$, $1 \le j \le s$, where $y_j$ is
defined by an output expression $E_j\langle e_j\rangle$ of $e_j$ atoms (representing inputs or
complements of inputs) and with parenthesis nesting depth $d_j$.

3)  $e = \max_j \{e_j\}$ is the maximum number of atoms contained in $E_j$, $1 \le j \le s$.

4)  $n = \sum_{j=1}^{s} e_j$ is the total number of atoms in all $E_j$'s.

5)  $d = \max_j \{d_j\}$ is the maximum parenthesis nesting depth among
all of the output expressions $E_j$.

It is clear that $n \ge s$, and we assume that $n \ge r$, i.e., each input is used in
at least one output expression.


Theorem  1 

Any combinational circuit C<r,s,e,n,d> can be realized using gates
of fan-in 2 and fan-out f in

    $T_G \le \lceil \log e \rceil + 2(d + \lceil \log_f n \rceil)$

with

    $G \le \left(1 + \frac{2}{f-1}\right) n + \left(1 - \frac{1}{f-1}\right) r - s$.


Proof 

First, consider the fan-out of the inputs. Let the i-th input be
used $e_i$ times in output expressions. Since we may need to complement the
input, we first fan it out to $e_i + 1$ places (the extra one for complementation).
By Lemma 1 we need (since we assume each input atom is used at least once)

    $T_{G1} \le \lceil \log_f (n-s+1) \rceil - 1 \le \lceil \log_f n \rceil - 1$

with

    $G_1 \le \sum_{i=1}^{r} \frac{e_i}{f-1} = \frac{n}{f-1}$.

Now we can complement each input variable in

    $T_{G2} = 1$

with $G_2 \le r$,

and fan the complemented variable out, each to at most $e_i$ places. Thus we have

    $T_{G3} \le \lceil \log_f (n-s+1) \rceil - 1 \le \lceil \log_f n \rceil - 1$

with

    $G_3 \le \sum_{i=1}^{r} \frac{e_i - 1}{f-1} = \frac{n-r}{f-1}$.
i=l 
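Next, we consider the fan-in of the atoms to form the output
variables according to the output expressions $E_j\langle e_j\rangle$. By Lemma 2 we
have

    $T_{G4} \le \begin{cases} 1 + 2d + \lceil \log e \rceil & \text{if } d_j < \frac{3}{2}\log e_j,\ 1 \le j \le s \\ 4\lceil \log e \rceil & \text{otherwise} \end{cases}$

with

    $G_4 \le \begin{cases} \sum_{j=1}^{s} (e_j - 1) = n - s & \text{if } d_j < \frac{3}{2}\log e_j,\ 1 \le j \le s \\ \sum_{j=1}^{s} 2(e_j - 1) = 2(n-s) & \text{otherwise.} \end{cases}$

Thus (assuming $d_j < \frac{3}{2}\log e_j$, $1 \le j \le s$) we have a total of

    $T_G = T_{G1} + T_{G2} + T_{G3} + T_{G4} \le \lceil \log e \rceil + 2(d + \lceil \log_f n \rceil)$

with

    $G = G_1 + G_2 + G_3 + G_4 \le \left(1 + \frac{2}{f-1}\right) n + \left(1 - \frac{1}{f-1}\right) r - s$.

Q.E.D.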

Example  1 

Suppose we have a 16 pin integrated circuit package which contains
only combinational logic. Assume we can use 7 pins for inputs and 7 pins
for outputs, i.e., r = s = 7. Assume that we have an average of 4 atoms
per output expression so n = 4·7 = 28, the maximum number of atoms per
expression is e = 8, and d = 2. Thus a typical output expression may be of
the form

    $y_j = (x_1 + x_2) \cdot (\bar{x}_3 + x_5)$.

Let us use circuits with fan-in 2 and fan-out 8. Now for any
possible combinational logic with the above characteristics, a package
can be designed such that the total package time in gate delays is

    $T_G \le \lceil \log e \rceil + 2(d + \lceil \log_f n \rceil)$
        $= \lceil \log 8 \rceil + 2(2 + \lceil \log_8 28 \rceil) = 3 + 2(4) = 11$.

The total number of gates in any such package is at most

    $G \le (1 + \frac{2}{7})28 + (1 - \frac{1}{7})7 - 7 = 35$.

Example  2 

Suppose we have a 48 pin package for large-scale integrated
circuits. Let r = s = 23, n = 6·23 = 138, e = 16, d = 3, and f = 8. Now
any possible combinational circuit can be realized with

    $T_G \le \lceil \log 16 \rceil + 2(3 + \lceil \log_8 138 \rceil) = 4 + 2(6) = 16$

and

    $G \le (1 + \frac{2}{7})138 + (1 - \frac{1}{7})23 - 23 \approx 174$.

Thus  we  see  that  for  realistic  assumptions  about  packages  and 
logical  expressions,  we  obtain  gate  and  time  bounds  that  are  of  practical 
interest. 
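The arithmetic in these examples is easy to reproduce. The sketch below is an illustration added here, not part of the original paper; the names `ceil_log` and `theorem1_bounds` are hypothetical, and the formulas are those of Theorem 1 under its assumption that the parenthesis nesting depth is small. Exact rational arithmetic is used so that no rounding enters the gate bound.

```python
from fractions import Fraction


def ceil_log(x, base=2):
    """Smallest integer k >= 0 with base**k >= x (integer arithmetic only)."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k


def theorem1_bounds(r, s, e, n, d, f):
    """Time and gate bounds of Theorem 1 for C<r,s,e,n,d> with fan-out f.
    Hypothetical helper written for illustration only."""
    time = ceil_log(e) + 2 * (d + ceil_log(n, f))
    gates = (1 + Fraction(2, f - 1)) * n + (1 - Fraction(1, f - 1)) * r - s
    return time, gates


if __name__ == "__main__":
    # Example 1: 16 pin package, r = s = 7, n = 28, e = 8, d = 2, f = 8.
    print(theorem1_bounds(r=7, s=7, e=8, n=28, d=2, f=8))
    # Example 2: r = s = 23, n = 138, e = 16, d = 3, f = 8.
    print(theorem1_bounds(r=23, s=23, e=16, n=138, d=3, f=8))
```

With the parameters of Example 1 the function returns a time bound of 11 gate delays and a gate bound of 35; with those of Example 2 it returns a time bound of 16 gate delays.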




3.   Sequential  Circuits 

In this section we discuss methods of transforming sequential
circuits into combinational ones and give time bounds and component bounds
on the resulting circuits.

Definition  2 

A sequential circuit S<r,s,e,n,d,m> is defined at time t by

1)  A set of inputs $x_i(t)$, $1 \le i \le r = r_1 + r_2$. We call the
$x_i(t)$, $1 \le i \le r_1$, the external inputs, and the $x_i(t)$, $r_1 + 1 \le i \le r$,
the feedback inputs.

2)  A set of outputs $y_j(t)$, $1 \le j \le s = s_1 + s_2$, where for any
logical functions $f_j$,

    $y_j(t) = f_j[x_1(t), x_2(t), \ldots, x_r(t)]$
           $= f_j[x_1(t), \ldots, x_{r_1}(t), y_{s_1+1}(t-m_1), \ldots, y_s(t-m_{r_2})]$

as shown in Figure 2. We call the $y_j(t)$, $1 \le j \le s_1$, the external outputs
and the $y_j(t)$, $s_1 + 1 \le j \le s$, the feedback outputs. Note that $r_2 \ge s_2$.
Each output is defined by an output expression $E_j\langle e_j\rangle$ of $e_j$ atoms
(representing inputs or their complements). Expression $E_j$ has parenthesis nesting
depth $d_j$.

3)  $e = \{e_1, e_2\}$ where

    $e_1 = \max_{1 \le j \le s_1} \{e_j\}$  and  $e_2 = \max_{s_1+1 \le j \le s} \{e_j\}$.

4)  $n = n_1 + n_2$ where

    $n_1 = \sum_{j=1}^{s_1} e_j$  and  $n_2 = \sum_{j=s_1+1}^{s} e_j$.

5)  $d = \{d_1, d_2\}$ where

    $d_1 = \max_{1 \le j \le s_1} \{d_j\}$  and  $d_2 = \max_{s_1+1 \le j \le s} \{d_j\}$.

6)  A set of delays $m_i$, $1 \le i \le r_2$, where
$m = \max_i m_i$ is the maximum delay.

[Figure 2. Sequential Circuit: a clocked circuit with inputs and outputs.]

Definition  3 

A linear sequential circuit is a sequential circuit with outputs
$y_i(t)$, $s_1 + 1 \le i \le s$, of the form

    $y_i(t) = f_i[x_1(t), \ldots, x_{r_1}(t), y_{s_1+1}(t-m_1), \ldots, y_s(t-m_{r_2})]$
           $= c_i + a_{i1}\, y_{s_1+1}(t-m_1) + \cdots + a_{i r_2}\, y_s(t-m_{r_2})$,

where the $c_i$ and $a_{ij}$, $1 \le j \le r_2$, are derived from any logical functions of
the inputs $x_1(t), \ldots, x_{r_1}(t)$.

Definition 4

An m-th order linear recurrence system of n equations, R<n,m>, is
defined by

    $x_i = 0$   for $i \le 0$,

and

    $x_i = c_i + \sum_{j=i-m}^{i-1} a_{ij} x_j$   for $1 \le i \le n$,

where $1 \le m < n$, and the $c_i$ and $a_{ij}$ are constants. We assume that n and m
are powers of 2. If either is not, we choose the next higher power of 2
and apply our bounds and algorithms directly. The solution of this recurrence
is the set $\{x_i \mid 1 \le i \le n\}$.
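To make the definition concrete, the recurrence can be evaluated one equation at a time. The following is a minimal serial reference sketch added for illustration, not the parallel method developed in the paper; the function name `solve_recurrence` and the dictionary representation of the coefficients $a_{ij}$ are assumptions of this example.

```python
def solve_recurrence(c, a, m):
    """Solve R<n,m> serially, directly from the definition.

    c: list of the n constants c_1..c_n.
    a: dict mapping (i, j) to a_ij with 1-based indices; missing entries
       are taken as 0.
    Returns [x_1, ..., x_n].  Hypothetical helper for illustration only.
    """
    n = len(c)
    x = {}
    for i in range(1, n + 1):
        total = c[i - 1]
        # Only j = i-m .. i-1 contribute, and x_j = 0 for j <= 0.
        for j in range(max(1, i - m), i):
            total += a.get((i, j), 0) * x[j]
        x[i] = total
    return [x[i] for i in range(1, n + 1)]


if __name__ == "__main__":
    # First-order example: x_i = 1 + 2 * x_{i-1}.
    coeffs = {(i, i - 1): 2 for i in range(2, 5)}
    print(solve_recurrence([1, 1, 1, 1], coeffs, m=1))
```

This serial evaluation takes n steps; it is useful mainly as a correctness reference against which a parallel or combinational realization can be compared.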

The following lemma forms the basis of much of our subsequent work.
We will use it to count gates as well as higher level components such as
integrated circuit packages or whole processors. Thus we state the lemma in
terms of operations θ which can be interpreted as logical OR and AND or as
arithmetic addition and multiplication. When we deal with fan-out, at the
gate level θ corresponds to gates while at the processor level it refers to
registers or demultiplexers.

Lemma  3 

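Any m-th order linear recurrence R<n,m> can be solved in

    $T_\theta \le \left(\frac{5}{2} + \log m + \frac{1}{2}\log_f n\right)\log n - \frac{1}{2}(\log^2 m + \log m)$

with

    $\theta \le \frac{1}{2}\left[m^2\left(2 + \frac{1}{f-1}\right) + m\left(1 + \frac{1}{f-1}\right)\right] n \log n$
        $+ \left[m^3\left(1 + \frac{1}{f-1}\right) - m^2\left(2 + \frac{1}{f-1}\right) - \frac{m}{f-1}\right] n + 2m^2 + \left(\frac{2}{f-1}\right)\log n$

where $f = 2^q$, $q \ge 1$.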

Proof 

Our proof follows the proof of Theorem 2 of [2] and a logical
circuit can be constructed following Algorithm 2 of [2]. First, we consider
the time required. The computational θ delays follow directly from the time
bound for solving an R<n,m> system in Theorem 2 of [2]. Thus, for the first
part of our time bound, we have from Theorem 2 of [2]

    $T_1 \le (2 + \log m)\log n - \frac{1}{2}(\log^2 m + \log m)$.

To complete the time bound, we must consider the fan-out time
required by Theorem 2 of [2]. Such times were regarded as negligible
compared to arithmetic operation times in [2]. The solution of an R<n,m>
system is generated in log n iterations. It may be seen from Figure 4 of [2]
that on iteration i = log k, we perform at most $(\frac{k}{2} + m - 1)$-way fan-outs.
Thus the fan-out time on iteration i is $\lceil \log_f (\frac{k}{2} + m - 1) \rceil - 1$, by Lemma 1.
Summing over all iterations, for k = 2, 4, 8, ..., n, we have (since
$f = 2^q \ge 2$),

    $T_2 \le (\lceil \log_f 2f \rceil - 1) + (\lceil \log_f 4f \rceil - 1) + \cdots + (\lceil \log_f(\frac{n}{2} + m - 1) \rceil - 1)$

and grouping terms, we get

    $T_2 \le q\left(1 + 2 + 3 + \cdots + \left\lceil \frac{\log n - \log 2f + 1}{q} \right\rceil\right)$
        $= q(1 + 2 + 3 + \cdots + \lceil \log_f n \rceil - 1)$
        $\le q(1 + 2 + 3 + \cdots + \log_f n) = \frac{q}{2}\log_f n\,(1 + \log_f n)$
        $= \frac{1}{2}\log n\,(1 + \log_f n)$.

Thus our total time is $T_1 + T_2$ or

    $T_\theta \le (2 + \log m)\log n - \frac{1}{2}(\log^2 m + \log m) + \frac{1}{2}\log n\,(1 + \log_f n)$
        $= \left(\frac{5}{2} + \log m + \frac{1}{2}\log_f n\right)\log n - \frac{1}{2}(\log^2 m + \log m)$.

Next, we consider the number of θ operations required. In the proof
of Theorem 2 of [2], we gave expressions for counting the number of processors
in evaluating an R<n,m> system. Since a tree of n leaves has at most 2n - 1
nodes, we can upper bound the number of θ operations by doubling the processor
count from Theorem 2 of [2]. We choose the worst expression for the
processor count on iteration i = log k, namely, expression (2 ) of [2], the
$2m \le k < n$ case, sum over all iterations, for $k \in K = \{2, 4, 8, \ldots, n\}$,
and multiply by 2 to bound the θ operations. Thus, ignoring fan-out for the
moment, we have a total of


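    $\theta_1 \le 2\sum_{k \in K} \left\{ \left[1 + \left(\frac{n}{k} - 2\right)(m+1)\right]\left[\sum_{j=1}^{m} j + m\left(\frac{k}{2} - 1\right)\right] + (m+1)\left[\sum_{j=1}^{m} j + m\left(\frac{k}{2} - m\right)\right] \right\}$,

where

    $K = \{2, 4, 8, \ldots, n\}$.

By rearranging terms, we have

    $= 2\sum_{k \in K} \left\{ \left[1 + \left(\frac{n}{k} - 1\right)(m+1)\right]\sum_{j=1}^{m} j + \left(\frac{n}{k} - 2\right) m(m+1)\left(\frac{k}{2} - 1\right) + m\left(\frac{k}{2} - 1\right) + m(m+1)\left(\frac{k}{2} - m\right) \right\}$.

Now summing on j gives

    $= 2\sum_{k \in K} \left\{ \left[\frac{n}{k}(m+1) - m\right]\frac{m(m+1)}{2} + \left(\frac{n}{k} - 2\right) m(m+1)\left(\frac{k}{2} - 1\right) + \frac{k}{2}\,m(m+2) - (m^3 + m^2 + m) \right\}$

    $= 2\sum_{k \in K} \left\{ -\frac{m^2 k}{2} + \frac{(m^3 - m)n}{2k} + \frac{(m^2 + m)}{2}\,n - \frac{3m^3 - m^2 - 2m}{2} \right\}$

    $\le \left[-m^2(2n - 2) + (m^3 - m)n + (m^2 + m)\,n \log n\right]$

    $= (m^2 + m)\,n \log n + (m^3 - 2m^2 - m)\,n + 2m^2$.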
As is discussed in [2], the trees we are evaluating are of a
special form with · operations at the leaf nodes and + operations elsewhere.
The above sum can be used as an exact count of · operations. But since the
trees are somewhat sparse, a more refined count reduces the number of +
operations. Thus our factor of 2 above is too large. By a straightforward
but long argument similar to the above, we can show that the θ operation
count is actually bounded by

    $\theta_1 \le \left(m^2 + \frac{m}{2}\right) n \log n + (m^3 - 2m^2)\,n + m(2m - 1)$

which we use in the statement of the lemma.

Now we consider the number of fan-out θ operations required. It
follows from Theorem 2 of [2] that iteration i requires $(m^2 + m)n/k - m^2$
fan-outs, each fanning out to at most k/2 + m - 1 destinations. Thus the
total number of θ operations can be computed using Lemma 1 as

    $\theta_2 \le \frac{(m^2 + m)n}{f-1}\sum_{k \in K}\frac{1}{k}\left(\frac{k}{2} + m - 2\right) - \frac{m^2}{f-1}\sum_{k \in K}\left(\frac{k}{2} + m - 2\right)$,   $K = \{2, 4, 8, \ldots, n\}$.

Summing, we obtain

    $\theta_2 \le \frac{(m^2 + m)n}{f-1}\left[\frac{\log n}{2} + m - 1\right] - \frac{m^2}{f-1}\left[\frac{2n - 2}{2} + (m - 2)\log n\right]$

    $= \frac{m^2 + m}{2(f-1)}\,n \log n + \frac{m^3 - m^2 - m}{f-1}\,n - \frac{m^3 \log n - 2m^2 \log n - m^2}{f-1}$

    $\le \frac{m^2 + m}{2(f-1)}\,n \log n + \frac{m^3 - m^2 - m}{f-1}\,n + \frac{2}{f-1}\log n$.

Note that at the gate level these θ operations are gates and are
comparable to the gates counted in $\theta_1$. At the integrated circuit or processor
level, these θ operations correspond to registers or demultiplexers which are
generally less costly than the θ operations of $\theta_1$. But to be conservative
we count each of them as one θ operation. Thus our total θ operation count
is $\theta = \theta_1 + \theta_2$, so

    $\theta \le \left[m^2 + \frac{m}{2} + \frac{m^2 + m}{2(f-1)}\right] n \log n + \left[m^3 - 2m^2 + \frac{m^3 - m^2 - m}{f-1}\right] n + 2m^2 + \left(\frac{2}{f-1}\right)\log n$

    $= \frac{1}{2}\left[m^2\left(2 + \frac{1}{f-1}\right) + m\left(1 + \frac{1}{f-1}\right)\right] n \log n$
        $+ \left[m^3\left(1 + \frac{1}{f-1}\right) - m^2\left(2 + \frac{1}{f-1}\right) - \frac{m}{f-1}\right] n + \left(\frac{2}{f-1}\right)\log n + 2m^2$.

Q.E.D.
The following corollary follows directly from Lemma 3 and covers
a case of wide practical interest.

Corollary 1   Any first order linear recurrence R<n,1> can be solved in

    $T_\theta \le \frac{1}{2}(5 + \log_f n)\log n$

with

    $\theta \le \frac{1}{2}\left(3 + \frac{2}{f-1}\right) n \log n - \left(1 + \frac{1}{f-1}\right) n + \left(\frac{2}{f-1}\right)\log n + 2$.

Thus we see that for large fan-outs, we can solve any R<n,1> system
in $T_G = O(\log n)$ with $G = O(n \log n)$.
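The bounds of Lemma 3 and Corollary 1 can be tabulated for particular n, m and f. The sketch below is an illustration added here, with hypothetical function names; it encodes the two statements as read above, assumes n, m and f are powers of 2 with f ≥ 2 (as the lemma does), and uses exact rational arithmetic. It also checks that setting m = 1 in Lemma 3 reproduces Corollary 1.

```python
from fractions import Fraction


def log2_exact(x):
    """log2 of a power of two, as an integer."""
    if x < 1 or x & (x - 1):
        raise ValueError("expected a power of two")
    return x.bit_length() - 1


def lemma3_bounds(n, m, f):
    """(time, operation) bounds of Lemma 3 for R<n,m> with fan-out f = 2**q.
    Hypothetical helper written for illustration only."""
    ln, lm, q = log2_exact(n), log2_exact(m), log2_exact(f)
    logf_n = Fraction(ln, q)            # log_f n = log n / q
    g = Fraction(1, f - 1)
    time = (Fraction(5, 2) + lm + logf_n / 2) * ln - Fraction(lm * lm + lm, 2)
    ops = (Fraction(1, 2) * (m**2 * (2 + g) + m * (1 + g)) * n * ln
           + (m**3 * (1 + g) - m**2 * (2 + g) - m * g) * n
           + 2 * m**2 + 2 * g * ln)
    return time, ops


def corollary1_bounds(n, f):
    """(time, operation) bounds of Corollary 1 for R<n,1>."""
    ln, q = log2_exact(n), log2_exact(f)
    g = Fraction(1, f - 1)
    time = Fraction(1, 2) * (5 + Fraction(ln, q)) * ln
    ops = Fraction(1, 2) * (3 + 2 * g) * n * ln - (1 + g) * n + 2 * g * ln + 2
    return time, ops


if __name__ == "__main__":
    for n, f in [(8, 2), (64, 8), (1024, 4)]:
        assert lemma3_bounds(n, 1, f) == corollary1_bounds(n, f)
        print(n, f, corollary1_bounds(n, f))
```

For n = 8 and f = 2, both functions give a time bound of 12 and an operation bound of 52.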


Example  3 


The R<8,1> system

    $c_i = 0$,   $i \le 0$

and

    $c_i = y_i + x_i \cdot c_{i-1}$,   $1 \le i \le 8$

can be used to describe the carry generation in a binary adder (cf. Theorem 3).
A circuit to generate the $c_i$ follows directly from Lemma 3 and Algorithm 2
of [2] by interpreting · as AND, and + as OR. The circuit is shown in
Figure 3, assuming f = 5.
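The carry recurrence of this example can be evaluated either serially or in about log n rounds. The sketch below is an illustration added here and is not Algorithm 2 of [2]: it uses generic recursive doubling (each round combines position i with position i - d, then doubles d), which rests only on the associativity of composing the maps c ↦ y + x·c. The function names are hypothetical, and fan-out limits are ignored.

```python
def carries_serial(y, x):
    """c_i = y_i OR (x_i AND c_{i-1}) with c_0 = 0, one step at a time."""
    c, out = 0, []
    for yi, xi in zip(y, x):
        c = yi | (xi & c)
        out.append(c)
    return out


def carries_doubling(y, x):
    """Same carries in ceil(log2 n) rounds by recursive doubling.
    Hypothetical helper written for illustration only."""
    g, p = list(y), list(x)
    n, d = len(g), 1
    while d < n:
        # One round: combine each position with the one d places below it.
        new_g = [g[i] | (p[i] & g[i - d]) if i >= d else g[i] for i in range(n)]
        new_p = [p[i] & p[i - d] if i >= d else p[i] for i in range(n)]
        g, p, d = new_g, new_p, 2 * d
    return g


if __name__ == "__main__":
    a, b = 0b10110111, 0b01101101
    # Assumption of this example: y_i = a_i AND b_i, x_i = a_i OR b_i.
    y = [(a >> i) & (b >> i) & 1 for i in range(8)]
    x = [((a >> i) | (b >> i)) & 1 for i in range(8)]
    print(carries_serial(y, x))
    print(carries_doubling(y, x))
```

Both functions return the same list of carries; the serial version needs 8 dependent steps for an 8-bit adder, while the doubling version needs 3 rounds.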

Next  we  give  a  corollary  of  Lemma  3  which  shows  the  ranges  of 
time  and  gates  for  an  R<n,m>  system  as  fan-out  ranges  from  2  to  an  arbitrarily 
high  number. 

Corollary 2   Any m-th order linear recurrence R<n,m> can be solved
in

    $(2 + \log m)\log n - \frac{1}{2}(\log^2 m + \log m) \le T_\theta \le \left(\frac{5}{2} + \log m + \frac{1}{2}\log n\right)\log n - \frac{1}{2}(\log^2 m + \log m)$

with

    $\left(m^2 + \frac{m}{2}\right) n \log n + (m^3 - 2m^2)\,n + O(m^2 \log n) \le \theta \le \frac{1}{2}\left[3m^2 + 2m\right] n \log n + \left[2m^3 - 3m^2 - m\right] n + 2\log n + 2m^2$.

Proof   The lower bounds follow directly from $T_1$ and $\theta_1$ in the proof
of Lemma 3, assuming that fan-out time and θ count are negligible. The upper
bounds follow from Lemma 3 by setting f = 2.

Thus we see that for large fan-outs we can solve an R<n,m> system
in $T_G = O(\log m \log n)$ with $G = O(m^2 n \log n)$.


Definition  5 

The k step operation of a sequential circuit S is defined by k pairs
of vectors

    $[(x_1(t), \ldots, x_{r_1}(t)),\ (y_1(t), \ldots, y_{s_1}(t))]$

for $1 \le t \le k$. These vectors represent the external inputs and outputs of
S at each time step t.
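As an illustration of a k step operation, consider the bit serial adder mentioned in the Introduction. The sketch below is added here as an example and is not taken from the paper: the function name is hypothetical, operands are assumed to arrive least significant bit first, the single feedback output is the carry with delay 1, and the external output at each step is the sum bit.

```python
def serial_adder_k_steps(a_bits, b_bits):
    """Simulate k clock steps of a bit-serial adder, LSB first.

    Returns (sum_bits, carry_trace).  The carry is the feedback output and
    obeys the linear form y(t) = c(t) + a(t)*y(t-1), with c(t) = a AND b
    and a(t) = a OR b.  Hypothetical helper for illustration only.
    """
    carry = 0                      # feedback value y(t-1), initially 0
    sums, carries = [], []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)             # external output at step t
        carry = (a & b) | ((a | b) & carry)    # feedback output at step t
        carries.append(carry)
    return sums, carries


if __name__ == "__main__":
    # 5 + 3 on four bits, least significant bit first.
    sums, carries = serial_adder_k_steps([1, 0, 1, 0], [1, 1, 0, 0])
    print(sums, carries)
```

The k pairs (input bits at step t, sum bit at step t) form the k step operation of this circuit; the carry trace is the sequence of feedback values that a combinational realization would have to compute all at once.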
Theorem 2

The k step operation of any linear sequential circuit S<r,s,e,n,d,m>
can be realized by a combinational circuit such that for large k

    $T_G \le \frac{1}{2}(\log_f s_2 k)(\log s_2 k) + O(\log k)$
with


    $G \le \frac{2}{3}(m+1)^2 s_2^3\left(2 + \frac{1}{f-1}\right) k \log s_2 k + O(k)$.


Proof   Our proof is in three parts. First, we set up the A and b
arrays of Definition 4. Then we evaluate the resulting recurrence system.
Finally, we generate the external outputs.

The A matrix and b vector components can be generated from the
external inputs at any of the k time steps. Thus we have a total of $kr_1$
inputs to combinational circuit $C_1$ which produces as outputs the components
of A and b. Since a total of $n_2$ atoms are used in generating all of the
feedback outputs of S, there are at most $kn_2$ non-zero components in A and b.
The maximum number of atoms in any expression is $e_2$, the total number of
atoms is $kn_2$ and the maximum parenthesis depth is $d_2$, so we can set up the
A and b arrays with

    $C_1\langle kr_1, kn_2, e_2, kn_2, d_2\rangle$.

Next we solve the linear recurrence R<n,m>. There are a total
of $ks_2$ outputs in k time steps so $n = ks_2$. Since the maximum delay is m
time steps with $s_2$ outputs per time step, the bandwidth of this system is
at most $(m+1)s_2 - 1$. Thus we have a recurrence of the form $R\langle ks_2, (m+1)s_2 - 1\rangle$.

Finally, we generate the external outputs with combinational circuit
$C_2$. There are a total of kr inputs and $ks_1$ external outputs. The maximum
number of atoms in any output expression is $e_1$, the total number of atoms
in all output expressions is $kn_1$ and the maximum depth is $d_1$, so we have

    $C_2\langle kr, ks_1, e_1, kn_1, d_1\rangle$.

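Now we bound the gates and time required for each of these. By
Theorem 1, for $C_1$ we have

    $T_{G1} \le \lceil \log e_2 \rceil + 2(d_2 + \lceil \log_f kn_2 \rceil)$

with

    $G_1 \le \left(1 + \frac{2}{f-1}\right) kn_2 + \left(1 - \frac{1}{f-1}\right) kr_1 - kn_2$
        $= \left(1 - \frac{1}{f-1}\right) kr_1 + \frac{2}{f-1}\,kn_2$.

By Lemma 3, we can solve $R\langle ks_2, (m+1)s_2 - 1\rangle$ in

    $T_{G2} \le \left(\frac{5}{2} + \log (m+1)s_2 + \frac{1}{2}\log_f ks_2\right)\log ks_2$

with

    $G_2 \le \frac{1}{2}\left[(m+1)^2 s_2^2\left(2 + \frac{1}{f-1}\right) + (m+1)s_2\left(1 + \frac{1}{f-1}\right)\right] ks_2 \log ks_2$
        $+ \left(1 + \frac{1}{f-1}\right)(m+1)^3 s_2^3\, ks_2 + O((m+1)^2 s_2^2 \log ks_2)$.

By Theorem 1 we have for $C_2$

    $T_{G3} \le \lceil \log e_1 \rceil + 2(d_1 + \lceil \log_f kn_1 \rceil)$

with

    $G_3 \le \left(1 + \frac{2}{f-1}\right) kn_1 + \left(1 - \frac{1}{f-1}\right) kr - ks_1$.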


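Combining the above we have a total time of

    $T_G \le \lceil \log e_1 \rceil + \lceil \log e_2 \rceil + 2(d_1 + d_2 + \lceil \log_f kn_1 \rceil + \lceil \log_f kn_2 \rceil)$
        $+ \left(\frac{5}{2} + \log (m+1)s_2 + \frac{1}{2}\log_f s_2 k\right)\log s_2 k$.

Thus, for a fixed circuit, as we increase the number of operating
time steps k, we have

    $T_G \le \frac{1}{2}(\log_f s_2 k)(\log s_2 k) + O(\log k)$.

The total gate count is

    $G \le \frac{1}{2}\left[(m+1)^2 s_2^2\left(2 + \frac{1}{f-1}\right) + (m+1)s_2\left(1 + \frac{1}{f-1}\right)\right] s_2 k \log s_2 k$
        $+ \left[\left(1 - \frac{1}{f-1}\right)(r + r_1) + n_1 + \frac{2}{f-1}\,n - s_1\right] k$
        $+ \left(1 + \frac{1}{f-1}\right)(m+1)^3 s_2^4\, k + O((m+1)^2 s_2^2 \log ks_2)$.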


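Thus, for any fixed circuit, as k increases we have

    $G \le \frac{1}{2}\left[(m+1)^2 s_2^2\left(2 + \frac{1}{f-1}\right) + (m+1)s_2\left(1 + \frac{1}{f-1}\right)\right] s_2 k \log k s_2 + O(k)$

or (since $m \ge 1$ and $f \ge 2$)

    $G \le \frac{2}{3}(m+1)^2 s_2^3\left(2 + \frac{1}{f-1}\right) k \log s_2 k + O(k)$.

Q.E.D.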

Now  we  turn  to  the  consideration  of  higher  level  components  as  our 
basic  circuit  elements.   We  will  define  two  package  types  which  could  be 
implemented  directly  using  integrated  circuits.   Our  time  bounds  will  be 
expressed  in  package  delays.   The  techniques  of  the  previous  section  could 
be  used  to  design  such  packages.   Our  component  bounds  will  be  expressed  in 
terms  of  the  total  number  of  packages  required. 
Our strategy in this case is to decompose a linear recurrence
system R<n,m> into a number of small identical systems. These smaller
systems can be solved directly by interconnecting the integrated circuit
packages we specify. An algorithm to decompose a large R<n,m> system
has been given in [2], [5] for arithmetic operations. Here we present
the algorithm for logic design and consider only the R<n,1> case for the
sake of easy explanation. The R<n,1> case is by far the most common one
occurring in practical logic design, and our method can be extended to
larger m in a straightforward way.

Definition  6 

We  define  two  types  of  integrated  circuit  packages. 

a)  $IC_{R\langle n,1\rangle}$ is a package which accepts input atoms $c_i$ for
$1 \le i \le n$, and $a_i$ for $2 \le i \le n$. It computes the outputs $x_i$ for $1 \le i \le n$
according to the recurrence relation

    $x_0 = 0$,
    $x_i = c_i + a_i x_{i-1}$.

For signal input and output it has a total number of pins equal to 3n - 1
times the number of bits per atom.

b)  IC_T is a package which may accept input atoms $a_i$ and $b_i$ for
$1 \le i \le n$, and c and d. It computes the outputs $x_i$ for $1 \le i \le n$, according to

    $x_i = v_i w_i + y_i z_i$,

where either

i)   $v_i = a_i$, $w_i = c$, $y_i = b_i$ and $z_i = d$, $1 \le i \le n$

or

ii)   v.  =  a.,  v.  =  t>.  ,  y.  =  a.   and  z.  =  "b.  ,  1  <  i  <  n. 
1    ii    ii    l        i    i    —   — 

For  signal  input  and  output  it  has  a  total  number  of  pins  of  at  most  3n  +  2 
times  the  number  of  bits  per  atom.   In  general,  we  denote  the  total  number  of 
integrated  circuits  in  some  logical  circuit  by  IC. 

Example 4

An $IC_{R<4,1>}$ has a total of $3 \cdot 4 - 1 = 11$ signal pins if it is to solve
a Boolean recurrence. Suppose we are summing the bits in a 16-bit word and
will produce a $\log 16 = 4$ bit result. Then 4 bits are required per atom and
an arithmetic $IC_{R<4,1>}$ to solve this problem would need 44 signal pins. An
$IC_{U<3>}$ for Boolean operations requires $3 \cdot 3 + 2 = 11$ signal pins. An
arithmetic package for handling 4 bit numbers would need a total of 44 signal
pins.

The following algorithm is adapted from [5] (c.f. Ch. 4). It solves
any R<n,1> system by partitioning it into smaller systems.

Algorithm 1       Any given first-order linear recurrence

$$R<n,1>: \quad x_0 = 0, \qquad x_i = c_i + a_i x_{i-1}, \quad 1 \le i \le n$$

can be solved as follows.

Step 1

a) For any $h \ge 2$, compute $\lceil n/h \rceil$ independent recurrence systems
$Z^{(j)}$, $1 \le j \le \lceil n/h \rceil$, defined as follows.

$$Z^{(j)}: \quad z_0^{(j)} = 0, \qquad z_i^{(j)} = c_i^{(j)} + a_i^{(j)} z_{i-1}^{(j)}, \quad 1 \le i \le h,$$

where

$$c_i^{(j)} = c_{i+(j-1)h}, \qquad a_i^{(j)} = a_{i+(j-1)h} .$$

b) Compute $(\lceil n/h \rceil - 1)$ independent recurrence systems $Y^{(j)}$,
$2 \le j \le \lceil n/h \rceil$, defined as follows.

$$Y^{(j)}: \quad y_0^{(j)} = 1, \qquad y_i^{(j)} = a_i^{(j)} y_{i-1}^{(j)}, \quad 1 \le i \le h .$$

From this step we obtain h elements of the solution of the original
system, i.e., $x_i = z_i^{(1)}$ for $1 \le i \le h$.

Step 2

From the results of Step 1, compute the following recurrence system

$$\hat{z}_h^{(0)} = 0, \qquad \hat{z}_h^{(j)} = z_h^{(j)} + y_h^{(j)} \hat{z}_h^{(j-1)}, \quad 1 \le j \le \lceil n/h \rceil .$$

From this step we obtain another $(\lceil n/h \rceil - 1)$ elements of the
solution, i.e., $x_{jh} = \hat{z}_h^{(j)}$ for $2 \le j \le \lceil n/h \rceil$.

Step 3

From the results of Steps 1 and 2, compute the remaining elements
of the solution using the following $n - \lceil n/h \rceil - (h-1)$ independent
expressions

$$x_{i+(j-1)h} = z_i^{(j)} + y_i^{(j)} \hat{z}_h^{(j-1)}$$

for $1 \le i \le h - 1$ and $2 \le j \le \lceil n/h \rceil$.
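The three steps can be sketched in software as follows. This is only an illustrative sketch, not the circuit construction itself: the function names (`solve_sequential`, `solve_partitioned`) and the idea of passing the atom operations as `add`, `mul`, `zero` and `one` are assumptions made here so that the same code covers the Boolean (OR/AND) and the arithmetic (+/×) case. Each call to `solve_sequential` on a block of at most h atoms stands in for one R-type package, and Step 2 is solved by applying the same procedure recursively.

```python
import operator


def solve_sequential(c, a, add, mul, zero):
    """Reference solution of x_i = c_i + a_i * x_{i-1} with x_0 = zero."""
    x, prev = [], zero
    for ci, ai in zip(c, a):
        prev = add(ci, mul(ai, prev))
        x.append(prev)
    return x


def solve_partitioned(c, a, h, add, mul, zero, one):
    """Sketch of Algorithm 1: blocks of h atoms, solved and then stitched."""
    n = len(c)
    if n <= h:  # residual system, small enough for a single package
        return solve_sequential(c, a, add, mul, zero)
    starts = range(0, n, h)
    # Step 1a: local solutions z^(j), each started from zero.
    z = [solve_sequential(c[s:s + h], a[s:s + h], add, mul, zero) for s in starts]
    # Step 1b: running products y_i^(j) = a_i^(j) * y_{i-1}^(j), y_0 = one.
    # (Computed for every block here; the first block's products are only
    # ever multiplied by zero.)
    y = []
    for s in starts:
        prod, row = one, []
        for ai in a[s:s + h]:
            prod = mul(ai, prod)
            row.append(prod)
        y.append(row)
    # Step 2: the block-boundary values form a smaller first-order recurrence.
    boundary = solve_partitioned([zb[-1] for zb in z], [yb[-1] for yb in y],
                                 h, add, mul, zero, one)
    # Step 3: fill in the interior of every block after the first.
    x = list(z[0])
    for j in range(1, len(z)):
        carry_in = boundary[j - 1]
        x.extend(add(zi, mul(yi, carry_in))
                 for zi, yi in zip(z[j][:-1], y[j][:-1]))
        x.append(boundary[j])
    return x


# Boolean example with n = 16 and h = 4 (OR as "+", AND as "*").
c_bits = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
a_bits = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1]
boolean_result = solve_partitioned(c_bits, a_bits, 4,
                                   operator.or_, operator.and_, 0, 1)
```

The same function handles an arithmetic recurrence by passing `operator.add` and `operator.mul`; block sizes that do not divide n simply leave a shorter last block.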


Lemma 4

Any first-order linear recurrence R<n,1> can be solved in time

$$T_{IC} \le 2\,\frac{\log n}{\log h} - 1$$

using a total package count of

$$IC \le 6\,\frac{n}{h} + 4\,\frac{\log n}{\log h} - 7$$

with package types $IC_{R<h,1>}$ and $IC_{U<h-1>}$ for $h \ge 2$.

Proof

It follows directly from the above algorithm that we need one
$IC_{R<h,1>}$ type package for each $Z^{(j)}$ and $Y^{(j)}$ in Steps 1a and 1b.
This results in $(2\lceil n/h \rceil - 1)$ packages. In Step 3 we use
$(\lceil n/h \rceil - 1)$ packages of type $IC_{U<h-1>}$, corresponding to
Definition 6b, part i. We can treat Step 2 as a new
$R<\lceil n/h \rceil, 1>$ system and apply the same algorithm recursively to
solve this system. This implies that we reduce the size of the original
system from n to less than or equal to h, following the sequence

$$n' = n,\ \lceil n/h \rceil,\ \lceil \lceil n/h \rceil / h \rceil,\ \ldots,\ h,$$

and finally use one extra $IC_{R<h,1>}$ package to solve the residual system.
Hence, for each iteration we need $(2\lceil n'/h \rceil - 1)$ packages of type
$IC_{R<h,1>}$ and $(\lceil n'/h \rceil - 1)$ packages of type $IC_{U<h-1>}$.
Since at most $\frac{\log n}{\log h} - 1$ iterations are required, we have for
$IC_{R<h,1>}$ type packages a total of

$$IC = (2\lceil n/h \rceil - 1) + (2\lceil \lceil n/h \rceil / h \rceil - 1) + \cdots
\le 2\left(\frac{n}{h} + \frac{n}{h^2} + \cdots\right) + 3\left(\frac{\log n}{\log h} - 2\right) + 1,$$

and since $h \ge 2$,

$$IC \le 4\,\frac{n}{h} + 3\,\frac{\log n}{\log h} - 5 .$$

Similarly, for $IC_{U<h-1>}$ type packages, we have a total of

$$IC = (\lceil n/h \rceil - 1) + (\lceil \lceil n/h \rceil / h \rceil - 1) + \cdots
\le \frac{n}{h} + \left(\frac{n}{h^2} + \frac{1}{h}\right) + \left(\frac{n}{h^3} + \frac{1}{h^2} + \frac{1}{h}\right) + \cdots
\le 2\,\frac{n}{h} + \frac{\log n}{\log h} - 2 .$$

The time bound is obtained by the fact that all packages in the same step
per iteration are operating in parallel.

Q.E.D.


Example 5

The R<16,1> system

$$x_i = 0 \ \text{for}\ i \le 0$$

and

$$x_i = c_i + a_i x_{i-1}, \quad \text{for}\ 1 \le i \le 16,$$

can be solved by the circuit of Figure 4 which follows directly from Algorithm 1
with h = 4. The packages marked R represent $IC_{R<4,1>}$ types and those marked
U represent $IC_{U<3>}$ types.

For use in a later application, we now consider a special case of
an R<n,1> system. Let $a_i = 1$, for all i, in Algorithm 1. In this case we
need not perform Step 1b. So for each iteration, only $\lceil n'/h \rceil$
$IC_{R<h,1>}$ type packages are required. Also, note that all $Z^{(j)}$ are
computed in Step 1a by merely summing atoms. Since Steps 2 and 3 require only
multiplication by the y's generated in Step 1, which are 1's, no
multiplication is required in any package. From this we have

Corollary 3       Any R<n,1> system of the form

$$x_0 = 0, \qquad x_i = c_i + x_{i-1}, \quad 1 \le i \le n$$

can be solved in time

$$T_{IC} \le 2\,\frac{\log n}{\log h} - 1$$

using a total package count of

$$IC \le 4\,\frac{n}{h} + 3\,\frac{\log n}{\log h} - 4 .$$

4. Applications

In this section we will study several practical logic design
problems. The methods of section 3 will be used to derive time and
component bounds. We will consider binary addition and ones' position
counting in detail. In less detail we will consider binary multiplication,
digital filtering and a control problem.

Definition 7

By the addition of two n digit binary numbers $a = a_n \ldots a_1$
and $b = b_n \ldots b_1$ we mean the generation of sum digits $s = s_n \ldots s_1$ and
carry digit $c_n$, defined as follows.

We write

$$s_i = (a_i b_i + \bar{a}_i \bar{b}_i)\, c_{i-1} + (a_i \bar{b}_i + \bar{a}_i b_i)\, \bar{c}_{i-1} \qquad (1)$$

where $1 \le i \le n$ and $c_0 = 0$, such that $s_i = 1$ iff just one or all three of
$a_i$, $b_i$ and $c_{i-1}$ are equal to 1. Also we write

$$c_i = a_i b_i + (a_i + b_i)\, c_{i-1} \qquad (2)$$

where $1 \le i \le n$ and $c_0 = 0$, such that $c_i = 1$ iff any two or all three
of $a_i$, $b_i$ and $c_{i-1}$ are equal to 1. Now let

$$x_i = a_i + b_i \qquad (3)$$

and

$$y_i = a_i b_i . \qquad (4)$$

If we write

$$d_i = a_i b_i + \bar{a}_i \bar{b}_i = \overline{(a_i + b_i)} + a_i b_i = \bar{x}_i + y_i \qquad (5)$$

then Equation 1 can be rewritten as

$$s_i = d_i\, c_{i-1} + \bar{d}_i\, \bar{c}_{i-1} \qquad (6)$$

and Equation 2 can be rewritten as

$$c_i = y_i + x_i\, c_{i-1}, \quad 1 \le i \le n, \qquad c_0 = 0 . \qquad (7)$$
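As a quick sanity check of Equations 3 through 7, the following sketch evaluates them bit-serially and compares the result with ordinary integer addition. It assumes that `+` in the equations is Boolean OR, that juxtaposition is AND, and that `d_i` is the complement of the OR term combined with the AND term as in Equation 5; the helper names `to_bits` and `add_bits` are hypothetical and only serve this illustration.

```python
def to_bits(value, n):
    """Return [a_1, ..., a_n], least significant bit first."""
    return [(value >> k) & 1 for k in range(n)]


def add_bits(a_bits, b_bits):
    """Evaluate Equations 3-7 serially; returns ([s_1..s_n], c_n)."""
    carry = 0                       # c_0 = 0
    sums = []
    for ai, bi in zip(a_bits, b_bits):
        x = ai | bi                 # Equation 3
        y = ai & bi                 # Equation 4
        d = (1 - x) | y             # Equation 5: d = not(x) + y
        sums.append((d & carry) | ((1 - d) & (1 - carry)))  # Equation 6
        carry = y | (x & carry)     # Equation 7
    return sums, carry


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << k for k, bit in enumerate(bits))
```

The carry update is the only loop-carried dependence, which is why Equation 7 is the part handed to the recurrence solver.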


Our first result concerns binary addition using gates as components.

Theorem 3

Two $n = 2^t$, $t \ge 0$, digit binary numbers can be added in

$$T_G \le \tfrac{1}{2}(5 + \log_f n) \log n + 4$$

with 

G  -  (2  +  fll5  n  log  n  +  (8  -  -^-)  n  +  {-£-)   log  n  +  2  . 

Proof       Our proof consists of three parts.

1) To generate the $x_i$ and $y_i$, $1 \le i \le n$, from $a_i$ and $b_i$ by
Equations 3 and 4, we need 2n gates and one gate delay, so $T_{G1} = 1$ and
$G_1 = 2n$.

2) To generate the $s_i$, $1 \le i \le n$, from $x_i$, $y_i$, and $c_{i-1}$ using
Equation 6, we refer to Figure 5. A total of 7 gates are required for
each $s_i$, for a total of 7n gates. After $d_i$ and $c_{i-1}$ are available, three
gate delays are required. It will be seen in part 3 that the generation
of the $c_i$, $1 \le i \le n$, from $x_i$ and $y_i$ can be accomplished in $2 \log n$ steps.
So for $n \ge 2$ the two steps required to generate $d_i$ from $x_i$ and $y_i$ are no more
than the time required to generate $c_i$, since $2 \log 2 = 2$.

Figure 5
Sum Generation

It is easy to verify that the theorem holds for n = 1 by a direct construction.
Thus we have $T_{G2} = 3$ with $G_2 \le 7n$.

3) To generate the $c_i$, $1 \le i \le n$, from $x_i$ and $y_i$ using Equation 7,
we turn to Lemma 3. Since Equation 7 defines an R<n,1> system, it follows
immediately from Corollary 1 (c.f., Figure 3) that

$$T_{G3} \le \tfrac{1}{2}(5 + \log_f n) \log n$$


with 




G3  < 


3+  1 


2   f-1 


1        ? 
n  log  n  -  (l  +  j^-)    n  +  — r  log  n  +  2  . 


Thus we have from parts 1, 2 and 3 a total of

$$T_G = 1 + 3 + \tfrac{1}{2}(5 + \log_f n) \log n = \tfrac{1}{2}(5 + \log_f n) \log n + 4$$

and a gate count of

G


<   2n  +  7n  +    (|  +  ~-)    n  log  n  -    (l  +  — )    n  +  ~-  log  n  +   2 
-    (§  +  ^j)    n  log  n  +    (8  -  £^-)    n  +    (~j-)    log  n  +   2    . 


Q.E.D.

Next we consider binary addition with integrated circuit packages
as components.

Theorem 4

Two $n = 2^t$, $t \ge 0$, digit binary numbers can be added in time

$$T_{IC} \le 2\,\frac{\log n}{\log h} + 1$$

using a total package count of

$$IC \le 9\,\frac{n}{h} + 4\,\frac{\log n}{\log h} - 7$$

with package types $IC_R$ and $IC_U$ for $h \ge 2$.

Proof       The $x_i$ and $y_i$ of Definition 7 can be generated in one package delay
using $2n/h$ type $IC_U$ packages. The carries of Equation 7 (c.f., Figure 4) can be
generated following Lemma 4 in $T_{IC} \le 2\frac{\log n}{\log h} - 1$ using
$6\frac{n}{h} + 4\frac{\log n}{\log h} - 7$ packages. Then the sum bits of
Equation 6 can be generated in one package delay using $n/h$ packages of type
$IC_U$ following Definition 6b, part ii. Summing these counts proves the theorem.

Q.E.D.

Example 6

Consider the problem of adding two 32-bit binary numbers using
gates with fan-in 2 and fan-out 8. By the method of Theorem 3, the sum can
be formed in at most 21 gate delays since

$$T_G \le \tfrac{1}{2}(5 + \log_8 32) \log 32 + 4 = \tfrac{1}{2}\left(\tfrac{20}{3}\right) 5 + 4 = \tfrac{100}{6} + 4 < 21 .$$

The number of gates required is at most

$$G \le \left(\tfrac{3}{2} + \tfrac{1}{7}\right) 32 \cdot 5 + \left(8 - \tfrac{1}{7}\right) 32 + 2 \cdot 5 + 2 = \tfrac{23}{14} \cdot 160 + \tfrac{55}{7} \cdot 32 + 12 < 527 .$$

On the other hand, if integrated circuit packages are available
which handle 8 bits at a time, h = 8, we have the following. The total
package count is

$$IC \le 9 \cdot \tfrac{32}{8} + 4\,\frac{\log 32}{\log 8} - 7 < 37$$

and the number of package delays is

$$T_{IC} \le 2\,\frac{\log 32}{\log 8} + 1 < 5 .$$
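The arithmetic in this example is easy to re-check numerically. The sketch below simply evaluates the time bound of Theorem 3 and the time and package bounds of Theorem 4 as they are read here (logarithms base 2 unless a base is given); the function names are hypothetical and the gate count is deliberately left out.

```python
import math


def gate_delay_bound(n, f):
    """Theorem 3 time bound: (1/2)(5 + log_f n) log n + 4."""
    return 0.5 * (5 + math.log2(n) / math.log2(f)) * math.log2(n) + 4


def package_delay_bound(n, h):
    """Theorem 4 time bound: 2 log n / log h + 1."""
    return 2 * math.log2(n) / math.log2(h) + 1


def package_count_bound(n, h):
    """Theorem 4 package bound: 9 n/h + 4 log n / log h - 7."""
    return 9 * n / h + 4 * math.log2(n) / math.log2(h) - 7


adder_gate_delays = gate_delay_bound(32, 8)       # 100/6 + 4
adder_package_delays = package_delay_bound(32, 8)
adder_packages = package_count_bound(32, 8)
```

For n = 32 and f = h = 8 the three values fall just below the 21 gate delays, 5 package delays and 37 packages quoted in the example.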

The next application we study is a ones' position counter. This
is  the  problem  of  determining  the  number  of  ones  to  the  right  (say)  of  each 
bit  in  a  word.   The  problem  arises  in  various  real  world  contexts,  particularly 
in  control  design.   We  discuss  the  problem  because  of  its  practical  interest 
and  also  because  it  serves  as  an  interesting  case  standing  between  binary 
addition  and  binary  multiplication. 

As  we  saw  above,  given  the  theoretical  background  of  section  3  on 
solving  linear  recurrences,  the  design  of  a  binary  adder  is  straightforward. 
The  ones'  position  counter  is  not  as  easy,  however.   When  formulated  at  the 
bit  level,  this  problem  leads  to  a  nonlinear  recurrence  which  cannot  be 
solved  by  the  methods  of  section  3.   As  we  shall  see  later,  binary  multiplication 
also  shares  this  property. 

The technique we use to solve such logic design problems with bit
level nonlinearities is to reformulate them at a higher level where they
are in fact linear. The nonlinearity is thus hidden inside a more complex
bit level operator. In practical terms, this can be accomplished by building
a nonlinear circuit element and then combining these in linear ways according to
the techniques of section 3. Putting such nonlinearities inside integrated
circuit packages is an attractive possibility.

Definition 8

The ones' position counting of an n bit word $a = a_n \ldots a_2 a_1$
is the generation of a count vector $z = (z_n, \ldots, z_1)$ such that $z_i$ is the
sum of the number of ones in bits $a_i \ldots a_1$. Thus, the ones' position
count of a = 10110110 is the vector z = (5,4,4,3,2,2,1,0).

Following Definition 8, we can easily generate the z vector using
the following arithmetic R<n,1> system

$$z_0 = 0, \qquad z_i = a_i + z_{i-1}, \quad 1 \le i \le n .$$
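The recurrence is just a running sum over the bits, taken from the right. A minimal sketch, assuming the word is given as a string written $a_n \ldots a_1$ (the function name is hypothetical):

```python
def ones_position_count(word):
    """Return (z_n, ..., z_1) for a bit string written a_n ... a_1."""
    z, running = [], 0
    for bit in reversed(word):      # visit a_1 first
        running += int(bit)         # z_i = a_i + z_{i-1}
        z.append(running)
    return tuple(reversed(z))       # report z_n first, as in Definition 8


example_count = ones_position_count("10110110")
```

For the word used in Definition 8 this reproduces the vector (5,4,4,3,2,2,1,0).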

Thus by using log n bit adders (c.f., Theorem 4) as components we can solve
the system in O(log n) adder steps (c.f., Corollary 1), so

$$T_G = O(\log n) \cdot O(\log \log n) = O(\log n \log \log n) .$$

Since each adder has O(log n log log n) gates, we have a total gate count of

$$G = O(n \log n) \cdot O(\log n \log \log n) = O(n \log^2 n \log \log n) .$$

By formulating this problem in terms of integrated circuit packages
we can use Corollary 3 to achieve a better gate count than the above. Thus,
to solve an arithmetic R<n,1> system we need $IC = O(n/h)$. Each $IC_{R<h,1>}$
package is used to count 1's, so inside each package we can use the method of
Corollary 1 to solve an arithmetic R<h,1> system. Thus from Corollary 1,
we have $\theta = O(h \log h)$. Now let us choose h = log n so
$\theta = O(\log n \log \log n)$. Each such $\theta$ processor is used to add
log n bit numbers. Thus we use Theorem 3 to count the gates as
$G = O(\log n \log \log n)$. Multiplying these three levels of components we
obtain a total gate count of

$$G = O\!\left(\frac{n}{h} \cdot h \log h \cdot \log n \cdot \log \log n\right) = O\!\left(n \log n\, (\log \log n)^2\right) .$$

Similarly, we obtain the time. By Corollary 3 we have O(log n/log h)
package delays. Each package delay is $T = O(\log h)$ adder steps from
Corollary 1. And the add time by Theorem 3 is O(log log n). Hence, our total
time in gate delays is

$$T_G = O(\log n \cdot \log \log n) .$$

Thus we see that the time is the same but we have reduced the gate
count over the straightforward method. We can summarize this as


Theorem 5

The ones' position count of an $n = 2^t$, $t \ge 0$, bit word can be
generated in

$$T_G = O(\log n \cdot \log \log n)$$

with

$$G = O\!\left(n \cdot \log n \cdot (\log \log n)^2\right) .$$

We note that the gate count can be further improved by using more
types of packages. For example, if we let h = log log n in Step 1 of Algorithm 1
and h = log n in Step 2 (see proof of Lemma 4), we can obtain a solution in

$$T_G = O(\log n \cdot \log \log n)$$

with

$$G = O\!\left(n \cdot (\log \log n)\, (\log \log \log n)\right) .$$

By using even more package types, even better gate bounds are possible.

To  obtain  a  package  bound,  Corollary  3  can  be  applied  directly. 
The  following  example  illustrates  this. 

Example 7

Suppose we have packages of types $IC_{R<4,1>}$ and $IC_{U<3>}$ as illustrated
in Example 4. Then by Corollary 3, the ones' position count of an n-vector,
with n = 16, can be done with at most
$2(\tfrac{16}{4}) + 2(\tfrac{\log 16}{\log 4}) - 2 = 10$ $IC_{R<4,1>}$
packages and $2(\tfrac{16}{4}) + (\tfrac{\log 16}{\log 4}) - 2 = 8$ $IC_{U<3>}$
packages. Actually, by direct application of Algorithm 1, it can be easily
found that we need just five $IC_{R<4,1>}$ packages and three $IC_{U<3>}$
packages. The following table shows the package count for some practical
values of n.

                              n
          package        16    32    64

Bound     IC_R<4,1>      10    19    36
          IC_U<3>         8    17    33

Actual    IC_R<4,1>       5    11    21
          IC_U<3>         3     8    18

Table 1.   Ones' Position Count
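The rows of Table 1 can be reproduced by counting packages directly. The sketch below assumes the unit-coefficient case of Corollary 3 (no Step 1b), one R-type package per block of h atoms, one U-type package per block after the first, and one extra R-type package for the residual system; the split of the bound into an R part and a U part follows the two expressions in Example 7. Function names are hypothetical.

```python
import math


def actual_packages(n, h):
    """Packages used by applying Algorithm 1 recursively with all a_i = 1."""
    r_count = u_count = 0
    while n > h:
        blocks = math.ceil(n / h)
        r_count += blocks          # Step 1a: one R package per block
        u_count += blocks - 1      # Step 3: one U package per later block
        n = blocks                 # Step 2 becomes the next, smaller system
    return r_count + 1, u_count    # one more R package for the residual system


def bound_packages(n, h):
    """Per-type bounds as used in Example 7, rounded up to whole packages."""
    ratio = math.log2(n) / math.log2(h)
    return (math.ceil(2 * n / h + 2 * ratio - 2),
            math.ceil(2 * n / h + ratio - 2))


table = {n: (bound_packages(n, 4), actual_packages(n, 4)) for n in (16, 32, 64)}
```

With h = 4 the dictionary holds the Bound and Actual columns of Table 1 for n = 16, 32 and 64.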
Finally, let us consider a bit level formulation of this problem.
Following Definition 8, let $z_{ij}$ be the j-th, $1 \le j \le 1 + \log n$, bit of $z_i$.
We can imagine solving the problem using an array of half adders such that
the half adder in position (i,j) is described by ($\oplus$ denotes exclusive or)

$$z_{ij} = z_{i-1,j} \oplus c_{i,j-1} \qquad \text{Eq. 8}$$

$$c_{ij} = z_{i-1,j} \cdot c_{i,j-1} \qquad \text{Eq. 9}$$

where $c_{i,0} = a_i$ and $z_{0,j} = 0$. Notice that at the bit level, this is a
nonlinear recurrence and cannot be solved by the methods of section 3.
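To make the half-adder array concrete, the following sketch simulates Equations 8 and 9 row by row. It assumes the boundary conditions $c_{i,0} = a_i$ and $z_{0,j} = 0$, stores bit j of each count at list index j - 1, and returns the counts as integers for easy comparison; the function name and the `width` parameter are hypothetical.

```python
def half_adder_array(word, width):
    """Simulate Eq. 8 and Eq. 9; word is written a_n ... a_1.

    Returns [z_1, ..., z_n] as integers.
    """
    z_prev = [0] * width                 # z_{0,j} = 0
    counts = []
    for bit in reversed(word):           # row i handles input bit a_i
        carry = int(bit)                 # c_{i,0} = a_i
        z_row = []
        for j in range(width):
            z_row.append(z_prev[j] ^ carry)   # Eq. 8
            carry = z_prev[j] & carry         # Eq. 9
        counts.append(sum(b << j for j, b in enumerate(z_row)))
        z_prev = z_row
    return counts


array_counts = half_adder_array("10110110", 4)
```

Each row depends on the full previous row, which is the serial O(n) behavior the text attributes to this formulation.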

If we use half-adders as components, it is easy to see that the
problem can be solved with G = O(n log n) and $T_G = O(n)$. This gate count is
comparable to the best shown above, but it uses much more time.

Next, we turn to bounds for binary number multipliers.

Definition 9         By the multiplication of two n digit binary numbers
$a = a_n \ldots a_1$ and $b = b_n \ldots b_1$, we mean the generation of 2n product
digits $p = p_{2n} \ldots p_1$.

First, we can formulate the multiplication problem using a
straightforward (row parallel) carry-save adder array. If we let x correspond
to various pairwise ands of input bits [6], we obtain a coupled recurrence
system of the form

$$q_{ij} = x \oplus q_{i-1,j} \oplus c_{i-1,j-1} \qquad \text{Eq. 10}$$

$$c_{ij} = x \cdot q_{i-1,j} + x \cdot c_{i-1,j-1} + q_{i-1,j} \cdot c_{i-1,j-1} \qquad \text{Eq. 11}$$

Note that this nonlinear recurrence system is a generalization of Equations
8 and 9 for the ones' position counter. This cannot be solved by the methods
of section 3; however, we can solve it directly using an array of $n^2$ bit
level adders. This gives a circuit which can multiply two n bit numbers in
$T_G = O(n)$ with $G = O(n^2)$. Since we are interested in faster schemes, we
will now turn to two methods to solve the recurrence of Equations 10 and 11
in parallel.

The  first  method  uses  a  tree  of  2n  bit  adders.   First,  we  form  a 
standard  array  of  partial  products.   Then  we  use  the  adder  tree  to  form  the  sum. 


Theorem 6

Two $n = 2^t$, $t \ge 0$, digit binary numbers can be multiplied in

$$T_G \le \tfrac{1}{2}(6 + \log_f n) \log^2 n + \tfrac{1}{2}(14 + \log_f n) \log n + \log_f n + 1$$

with 

2  s  2  ,        ,_    2x2^.        ,,„    2 


G  < 


(3  +  — )n  log  n  +  (20  +  — )n  -  3n  log  n  -  (IT  -  J-^)    n 


1+1 


Proof         Our proof consists of two parts: 1) To generate the $n^2$
partial product bits $a_i \cdot b_j$, for all $1 \le i, j \le n$, we need $n^2$ AND
gates and one gate delay. Since each input bit is fanned out to n places we
have from the above and Lemma 1,

$$T_{G1} \le 1 + \log_f n$$

with

$$G_1 \le n^2 + 2n\left(\frac{n}{f-1}\right) \le n^2\left(1 + \frac{2}{f-1}\right) .$$

2) To generate the sum of the partial products we need an adder tree of
n - 1 adders. Each adder adds 2n bit numbers and the height of the tree is
log n adder delays. Thus, by Theorem 3, we have

$$T_{G2} \le \log n \left[\tfrac{1}{2}(5 + \log_f 2n) \log 2n + 4\right]
\le \tfrac{1}{2}(6 + \log_f n) \log^2 n + \tfrac{1}{2}(14 + \log_f n) \log n$$


with 


G2  =  (n  -  1) 


(f  +  -jT^)    2n  log  2n  +  (8  -  ^-)  2n  +  -~  log  2n  +  2 

<  (3  +  ~^-r)    n2  log  n  +  19n2  -  3n  log  n  -  (17  -  ~)    n  . 

Q.E.D. 

As  an  example  of  this  theorem,  consider  an  integrated  circuit 

package  as  follows. 

Example 8

Using gates with fan-out 8, a multiplier of two 4-bit numbers can
be implemented with a delay of

TG  <  |(6)  log2  h  +   |(lU)  log  k  +   1  =  20 
using 

G  <  (3  +  j)    h2   log  h  +   (20  +  j)   k2  -   3-U-log  \  -   (17  -  h   k   =  3^1 
The above result is somewhat sloppy because we considered all
inputs to the tree adder to be 2n bit numbers. In fact, the inputs to
the first level of adders are only n bit numbers. At succeeding levels
they are of length $n + 2$, $n + 5$, ..., $n + i + 2^i - 2$, for $1 \le i \le \log n$.
By a careful analysis which takes this increasing length into account, we
can improve the gate count in Theorem 6 by a factor between 2 and 3. Thus,
in our example the gate count could actually be bounded by a number between
115 and 170.

The  method  above  is  the  best  method  we  know  (in  terms  of  time) 
for  numbers  with  few  digits.   For  long  numbers,  the  next  method  is  the  best 
we  know.   The  crossover  between  the  two  occurs  between  8  and  16  bits. 

The next method is a variation of the Wallace-Dadda method [7],
[8]. It consists of three stages: generation of partial products, column
compression, and a carry propagate adder. This differs from Wallace-Dadda
only in the last stage.

The generation of partial products is done in the same way as in
Theorem 6. For an upper bound on time, we assume a three to two column
compression scheme [6]. The column compression for two n-bit numbers can be
done with $(n^2 - 4n + 3)$ full adders and $(n - 1)$ half adders. The half adder
can be built using 9 gates (see Theorem 3 with n = 1). A full adder of 2 bits
can be easily implemented with 11 gates by a scheme similar to Figure 5.
Thus we have a total of

$$G_2 \le 11(n^2 - 4n + 3) + 9(n - 1) = 11n^2 - 35n + 24 .$$

The time for the column compression is

$$T_{G2} \le 6 \log_{3/2} n \approx 10 \log n$$

since each full adder requires at most 6 gate delays.

Finally, to propagate the carry, we use a 2n bit adder (fewer
bits are actually needed)

$$T_{G3} \le \left(\tfrac{5}{2} + \tfrac{1}{2} \log_f 2n\right) \log 2n + 4$$

as we used in Theorem 6. This leads us to

Theorem 7

Two $n = 2^t$, $t \ge 0$, digit binary numbers can be multiplied in

$$T_G \le \left(13 + \tfrac{1}{2} \log_f n\right) \log n + \tfrac{3}{2} \log_f n + 8$$


with 


G  <  (12  +  ~j-)  n2  +  (3  +  ~j0  n  log  n  -  l6n  +  ^~  log  n  +  (26  +  —■) 
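The time bound of Theorem 7 can be checked by adding the delays of the three stages. The following is a sketch of that sum, not a derivation taken from the text; it assumes $\log_f 2n \le 1 + \log_f n$ (true for any fan-out $f \ge 2$) and takes the column-compression delay as $10 \log n$, the rounded figure used in the text.

```latex
\begin{align*}
T_G &\le T_{G1} + T_{G2} + T_{G3} \\
    &\le (1 + \log_f n) + 10 \log n + \tfrac{1}{2}(5 + \log_f 2n)\log 2n + 4 \\
    &\le (1 + \log_f n) + 10 \log n + \tfrac{1}{2}(6 + \log_f n)(\log n + 1) + 4 \\
    &= \left(13 + \tfrac{1}{2}\log_f n\right)\log n + \tfrac{3}{2}\log_f n + 8 .
\end{align*}
```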


Example 9

Using gates of fan-out 8, we can multiply two 32-bit numbers in

$$T_G \le \left(13 + \tfrac{1}{2} \log_8 32\right) \log 32 + \tfrac{3}{2} \log_8 32 + 8 < 81$$


with 


G  <  (12  +  j)    102U  +  (3+|)  321og  32  -  512  +  ylog  32  +  26  +  y  <  12,62k 

Thus far, all of our examples have dealt with R<n,1> systems. In
practical logic design linear recurrence systems with m > 1 also arise.

First, consider the following logical path tracing problem. Suppose
we are given two binary words $a = a_n \ldots a_1$ and $b = b_n \ldots b_1$ and a
starting bit, either $a_1$ or $b_1$. We wish to generate a word
$e = e_n \ldots e_1$ which consists of those bits on a path through a and b chosen
as follows. First, we let $e_1$ be the given starting bit. Then we choose bits
in the same word until we encounter a zero in (say) bit i, which causes us to
choose bit $e_{i+1}$ from the other word. We continue in the other word until we
encounter a zero which causes another switch, etc. We define

$$c_i = a_{i-1} \cdot c_{i-1} + \bar{b}_{i-1} \cdot d_{i-1}, \quad 2 \le i \le n$$

and

$$d_i = \bar{a}_{i-1} \cdot c_{i-1} + b_{i-1} \cdot d_{i-1}, \quad 2 \le i \le n$$

as two control words, where $c_1 = 1$ if $a_1$ is our starting bit and $d_1 = 1$ if
$b_1$ is our starting bit. Then we have

$$e_i = a_i \cdot c_i + b_i \cdot d_i .$$

The generation of $c = c_n \ldots c_1$ and $d = d_n \ldots d_1$ can be handled
as a coupled linear recurrence system of the form R<2n,3>.

As a final application of the ideas of this paper we mention
digital filtering. This topic has received a great deal of attention in
recent years. Our combinational results can be applied to nonrecursive
filters and our recurrence results can be applied to recursive filters in
rather direct ways. For more details about such filters see [9] or [10].




REFERENCES 


[1]       R. Brent, D. Kuck, and K. Maruyama, "The Parallel Evaluation of
Arithmetic Expressions Without Division," IEEE Transactions on
Computers, Vol. C-22, No. 5, pp. 532-534, May 1973.

[2]       S. C. Chen and D. Kuck, "Time and Parallel Processor Bounds for
Linear Recurrence Systems," IEEE Transactions on Computers,
Vol. C-24, No. 7, pp. 701-717, July 1975.

[3]       D. Kuck and Y. Muraoka, "Bounds on the Parallel Evaluation of
Arithmetic Expressions Using Associativity and Commutativity,"
Acta Informatica, Vol. 3, Fasc. 3, pp. 203-216, 1974.

[4]       R. P. Brent, "The Parallel Evaluation of Arithmetic Expressions
in Logarithmic Time," Complexity of Sequential and Parallel
Numerical Algorithms, J. F. Traub, ed., Academic Press, N.Y.,
1973.

[5]       S. C. Chen, "Speedup of Iterative Programs in Multiprocessor
Systems," Ph.D. thesis, Univ. of Ill. at Urb.-Champ., Dept. of
Computer Science Report No. 694, Jan. 1975.
(NSF - OCA - GJ-36936 - 000004).

[6]       A. Habibi and P. A. Wintz, "Fast Multipliers," IEEE Transactions
on Computers, Vol. C-19, No. 2, pp. 153-57, Feb. 1970.

[7]       C. S. Wallace, "A suggestion for a fast multiplier," IEEE
Transactions on Electronic Computers, Vol. EC-13, pp. 14-17, Feb. 1964.

[8]       L. Dadda, "Some schemes for parallel multipliers," Alta Frequenza,
Vol. 31, pp. 319-356, March 1965.

[9]       W. D. Little, "An Algorithm for High-Speed Digital Filters," IEEE
Transactions on Computers, Vol. C-23, No. 5, pp. 466-469, May 1974.

[10]      L. B. Jackson, J. F. Kaiser, and H. S. McDonald, "An Approach to the
Implementation of Digital Filters," IEEE Transactions on Audio and
Electroacoustics, Vol. AU-16, No. 3, Sept. 1968.


BIBLIOGRAPHIC DATA SHEET

1. Report No.: UIUCDCS-R-75-775

4. Title and Subtitle: Combinational Circuit Synthesis with Time and Component Bounds

5. Report Date: December 1975

7. Author(s): S. C. Chen and D. J. Kuck

9. Performing Organization Name and Address:
University of Illinois at Urbana-Champaign
Department of Computer Science
Urbana, Illinois 61801

11. Contract/Grant No.: US NSF DCR73-07980 A02

12. Sponsoring Organization Name and Address:
National Science Foundation
Washington, D. C.

13. Type of Report & Period Covered: Technical Report

16. Abstracts

New results are given concerning the design of combinational logic circuits.
We give time and component bounds for combinational circuits specified in several
ways. For any sequential machine defined by linear recurrence relations, we discuss
an algorithm for the synthesis of equivalent combinational logic. The procedure
includes upper bounds on the time and components involved. We also discuss the
transformation of nonlinear recurrences into combinational circuits. Examples are
given using gates as well as ICs as components. These include binary addition,
multiplication, and ones' position counting. The time and component bounds our
procedure yields compare favorably with traditional results.

17. Key Words and Document Analysis. 17a. Descriptors

Binary addition
Binary multiplication
Circuit synthesis
Combinational circuits
Component bounds
Sequential circuits
Time bounds

18. Availability Statement: Release Unlimited

19. Security Class (This Report): UNCLASSIFIED

20. Security Class (This Page): UNCLASSIFIED

21. No. of Pages: 48


